Over the last couple of years, FPGAs have made remarkable advances in performance and capacity. Designers now need solutions, such as physical synthesis, that let them take advantage of these improvements. At today's performance levels, past techniques often fail to deliver a design that closes on timing. In many leading-edge designs the traditional approaches are not enough; hence the need for synthesis algorithms that use physical data to improve the results.
This article discusses the gap between today's tool capabilities and what is needed for successful timing closure. It calls for synthesis algorithms that use physical data to improve timing closure results.
Timing closure is not simple to achieve, and the solution needs to be robust yet easy to use and accessible to the FPGA designer. ASICs and FPGAs require different implementation strategies, yet they have much in common. Both use leading-edge process technologies, currently 0.13 micron. Both are typically designed in VHDL or Verilog. Neither ships until timing is met. And both present designers with the ever-increasing challenge of timing closure.
Many EDA vendors are trying to solve the challenge of achieving timing closure in deep-submicron ASICs. The migration of ASIC designers to FPGAs and the advancements in FPGA architectures are also causing FPGA tool vendors to look seriously at this same challenge. The timing closure challenge in FPGAs is driven by the increase in complexity and performance of the current generation of FPGA devices. Ensuring that performance is achieved with these complex devices, which contain advanced building blocks such as DSP structures, high-speed I/O interfaces, specialized memories and built-in processors, requires a new approach to FPGA synthesis, one aimed specifically at achieving timing closure.
The biggest challenge to timing closure is the gap between the physical and logical worlds. For performance reasons, the current generation of synthesis algorithms estimates interconnect delay with wire-load models based on fan-out. With the previous generation of programmable devices and designs, this approach was sufficient because gate delay accounted for the majority of the delay. With the new generation of devices, this is changing. In these larger and faster devices, interconnect delay can account for up to 70 percent of the total delay. To achieve the best synthesis results, interconnect delay must be accurately modeled; hence the concept of physical synthesis. To make physical synthesis efficient, the synthesis database must seamlessly contain both the logical and physical data.
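The gap between a fan-out-based wire-load estimate and a placement-aware estimate can be illustrated with a small Python sketch. The function names and every coefficient here are hypothetical, chosen only to show the shape of the problem; real tools use device-specific routing models, not a simple Manhattan distance.

```python
# Illustrative only: the coefficients are made-up numbers, not device data.

def wire_load_delay(fanout, base_ns=0.3, per_load_ns=0.1):
    """Fan-out-based estimate: nets with equal fan-out get equal delay."""
    return base_ns + per_load_ns * fanout


def placed_delay(driver_xy, load_xy, base_ns=0.3, per_unit_ns=0.05):
    """Placement-based estimate: delay grows with Manhattan distance."""
    dx = abs(driver_xy[0] - load_xy[0])
    dy = abs(driver_xy[1] - load_xy[1])
    return base_ns + per_unit_ns * (dx + dy)


driver = (0, 0)
near_load = (1, 1)    # load placed next to the driver
far_load = (30, 20)   # load placed across the device

# The wire-load model sees only "fan-out of 2" and returns one number.
estimate = wire_load_delay(fanout=2)

# The placement-aware model distinguishes the two loads.
near = placed_delay(driver, near_load)
far = placed_delay(driver, far_load)

print(f"wire-load estimate: {estimate:.2f} ns")
print(f"placed, near load:  {near:.2f} ns")
print(f"placed, far load:   {far:.2f} ns")
```

With these assumed numbers the single wire-load figure is pessimistic for the near load and badly optimistic for the far one, which is the kind of error that hides the true critical path from the optimizer.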
The term physical synthesis has market appeal. It covers anything remotely connected with using physical data to improve synthesis results. The term means different things to different people and companies. It is helpful to remember that synthesis is a generic term that spans the synthesis or conversion of register-transfer level (RTL) to logic and the optimization of that logic. In its most generic form, physical synthesis combines placement with synthesis algorithms to produce a better quality of results (QoR). This can be achieved in two general areas.
The first area fits the pure definition of physical synthesis because interconnect delays from placement are used during synthesis to make architectural improvements. This level of change is usually done in conjunction with design planning. The advantage is that the changes are made before technology mapping.
The second area grouped under the physical synthesis banner can also be referred to as physical optimization. The focus is to use physical data to improve the optimization routines. Today's FPGA physical synthesis tools fit mostly under physical optimization. This will change as complexity continues to increase and designers move from floor planning to design planning.
Two constants characterize timing closure: the challenge varies from design to design, and designers want to get their product to market quickly with the least amount of work.
Because timing closure problems vary between designs and time-to-market pressure is constant, the tool must provide a number of alternative solutions. It is important that the solution first take an automated approach. However, for the extremely difficult timing closure problems, designers must be able to interact with the tool and their designs in both the logical and physical worlds to achieve timing closure and ship their parts.
FPGA designers were achieving timing closure long before EDA vendors started making noise about physical synthesis. Common approaches include rewriting the RTL, design iterations, working with the timing constraints and grouping cells. These approaches worked in most cases because the overall delay was driven by the cell delay and not by the interconnect delay.
The grouping of cells is an aid to the place-and-route engine. This can be done by floor planning or through the use of attributes in the RTL. The challenge with the grouping-of-cells approach is that it is more art than science. A nicely designed floor plan does not guarantee an effective grouping for place and route. Many designers have been burned trying to balance routing congestion against the desire to keep timing-critical signals together.
Physical results
When physical information is used in the synthesis process, it is called physical optimization. The first attempt to use physical data was to back-annotate delays from the place-and-route system and reoptimize the design. But this approach lacked the understanding of the device and available resources needed to properly account for the changes made during synthesis. To be effective, physical optimization must understand the device, routing resources, delay calculation, packing rules, design rules and logic resources. The physical optimization techniques that have proved the most popular include retiming, replication and resynthesis.
Retiming balances the positive and negative slacks found throughout the design. Once the node slacks are calculated, registers are moved across logic to steal from positive slack and give to negative slack. Retiming, commonly done as part of logic optimization because registers are plentiful in FPGA devices, improves performance and, in many cases, routability. Physical-aware retiming replaces the inaccurate wire-load models with physical routing models for the interconnect. Since interconnect delay is a dominant factor in total delay, it is important to account for this value as retiming takes place.
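A minimal Python sketch of the idea, assuming a single chain of logic between two fixed registers with one movable register in between. The delay values are hypothetical and stand for gate delay plus interconnect delay taken from placement; production retiming works on the whole netlist graph, not one chain.

```python
def stage_delays(delays, reg_pos):
    """Split a chain of delays at reg_pos into two register-to-register stages."""
    return sum(delays[:reg_pos]), sum(delays[reg_pos:])


def worst_slack(delays, reg_pos, period):
    """Slack of the slower of the two stages for a given register position."""
    return period - max(stage_delays(delays, reg_pos))


def retime(delays, period):
    """Try every legal register position and keep the best worst-case slack."""
    positions = range(1, len(delays))
    return max(positions, key=lambda p: worst_slack(delays, p, period))


# Hypothetical per-element delays in ns (gate plus placed interconnect).
delays = [1.0, 2.0, 3.0, 1.0, 1.0]
period = 5.0

original_pos = 4                      # stages of 7.0 ns and 1.0 ns
best_pos = retime(delays, period)     # register moved back across logic

print("before:", stage_delays(delays, original_pos),
      "worst slack", worst_slack(delays, original_pos, period))
print("after: ", stage_delays(delays, best_pos),
      "worst slack", worst_slack(delays, best_pos, period))
```

In this example the first stage starts 2 ns short while the second has 4 ns to spare; moving the register two elements earlier gives stages of 3 ns and 5 ns, so the positive slack pays off the negative slack. If the delays came from a wire-load model instead of placement, the chosen position could easily be the wrong one.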
Replication is especially effective in breaking up long interconnect. It can be performed at the logic level, but this is based on inaccurate wire-load models. Further improvement in the final results is achieved with the increased accuracy from the physical data. Because it only compounds the problem to add elements into a highly utilized area, the replication algorithm needs to understand what resources are available to determine which logic or registers are to be replicated.
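A resource-aware replication decision might be sketched in Python as follows. The grid coordinates, the distance threshold and the `free_sites` list are all assumptions made for illustration; an actual tool would consult the device's packing rules and routing model rather than Manhattan distance.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def worst_wire(driver, loads):
    """Length of the longest driver-to-load connection."""
    return max(manhattan(driver, load) for load in loads)


def replicate(driver, loads, free_sites, threshold):
    """Return {driver_site: [loads]} after adding at most one replica.

    A replica is added only if some loads are farther than `threshold`
    and a free site exists that actually shortens the longest wire.
    """
    far = [load for load in loads if manhattan(driver, load) > threshold]
    if not far or not free_sites:
        return {driver: list(loads)}      # nothing to fix, or no room
    # Pick the free site that minimizes the longest wire to the far loads.
    site = min(free_sites, key=lambda s: worst_wire(s, far))
    if worst_wire(site, far) >= worst_wire(driver, far):
        return {driver: list(loads)}      # the replica would not help
    near = [load for load in loads if load not in far]
    return {driver: near, site: far}


driver = (0, 0)
loads = [(1, 0), (0, 2), (20, 18), (22, 19)]
free_sites = [(21, 18), (5, 5)]           # hypothetical unused locations

result = replicate(driver, loads, free_sites, threshold=10)
for site, driven in result.items():
    print(site, "drives", driven, "longest wire", worst_wire(site, driven))
```

When `free_sites` is empty, which models a highly utilized region, the function leaves the net untouched instead of adding logic where there is no room for it.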
Resynthesis uses the physical data to make local optimizations to critical paths. This functionality crosses over from the physical optimization routines to the realm of physical synthesis. Resynthesis uses operations like logic restructuring, reclustering, substitution and/or possible elimination of gates and wires to improve timing and routability. Resynthesis can move logic from a critical path to a noncritical portion of the design. It is key that resynthesis works with incremental placement to understand what resources are available and how any changes impact congestion.
Proximity optimization
The common way to reduce long interconnect delay is to replicate some logic. But another important option is to move the elements closer together. During the development of the physical optimization algorithms it became obvious that optimizing the placement is a critical technique for improving performance. In fact, on some designs placement improvements deliver as much of an increase in performance as changes to the netlist, or more. However, to address timing closure, incremental placement technology needs to work closely with the logic optimization routines.
The design must be routable before timing closure can be achieved. Therefore placement requires a balance between timing and congestion. The placement improvements discussed in this section are different from the initial placement that FPGA vendors are providing. During the last few years, the improvements in QoR from the FPGA vendors have been significant, but their approach focuses on balancing a global and a local view. That balancing act is made harder because the design is not optimized with physical data. This increases the number of critical paths and the problems that the placer must resolve. During physical optimization a surgical approach to placement is needed. As the netlist is optimized, the original placement will contain overlaps. Placement optimization needs to resolve the overlaps with minimal impact on the rest of the design. While this approach must be aware of the global view, at this point the tool has the luxury of working on a finite number of critical paths.
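The overlap-resolution step can be sketched in Python as a simple legalizer. It assumes a uniform grid with one cell per site, which real FPGA fabrics do not have, and the cell names are invented; the point is only that cells on critical paths keep their sites and a displaced cell moves the shortest possible distance.

```python
def resolve_overlaps(placement, grid_w, grid_h, critical=()):
    """Return a legal placement {cell: (x, y)} with one cell per site.

    Cells named in `critical` claim their sites first. Any cell that
    finds its site taken moves to the nearest free site.
    """
    occupied = {}
    displaced = []
    # Critical cells sort first, so they win any contested site.
    for cell in sorted(placement, key=lambda c: (c not in critical, c)):
        site = placement[cell]
        if site in occupied:
            displaced.append(cell)
        else:
            occupied[site] = cell

    legal = {cell: site for site, cell in occupied.items()}
    all_sites = [(x, y) for x in range(grid_w) for y in range(grid_h)]
    for cell in displaced:
        want = placement[cell]
        free = [s for s in all_sites if s not in occupied]
        if not free:
            raise ValueError("no free site left for " + cell)
        # Nearest free site by Manhattan distance; ties broken by coordinates.
        best = min(free, key=lambda s: (abs(s[0] - want[0]) + abs(s[1] - want[1]), s))
        occupied[best] = cell
        legal[cell] = best
    return legal


# Hypothetical state after replication put a copy on top of the original.
placement = {
    "lut_a": (1, 1),
    "lut_a_copy": (1, 1),   # overlaps lut_a
    "reg_b": (2, 1),
    "lut_c": (0, 0),
}
legal = resolve_overlaps(placement, grid_w=3, grid_h=3, critical={"lut_a_copy"})
print(legal)
```

Only the one overlapping, noncritical cell moves, and it moves by a single site; the rest of the placement is left as it was.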
Physical synthesis retimes, replicates and resynthesizes the designs based on physical data. Incremental placement works to integrate the netlist changes while improving timing and reducing congestion. Placement optimization provides another important technique to solve timing problems.
It has been estimated that 95 percent of the time and resources are used on the last 5 percent of the design. Physical synthesis should provide as much automation as possible to solve the problem, but sometimes it is not going to be enough.
Designs that really push the performance levels sometimes need manual guidance to achieve timing closure. As the importance of physical design continues to grow, a physical viewer will be a critical part of any circuit debug. In the past a designer would use the schematic view to improve performance. Now it is important to visualize the device to identify congestion and timing problems. For most FPGA designers a physical viewer is not in the standard design flow, but as design complexity increases, it will become critical to achieving timing closure. Important functionality includes an interactive and incremental connection to timing analysis, fast graphics and intuitive editing commands.
The goal of the physical viewer is to help the user close on timing as quickly and easily as possible. A connection must be made between the timing reports and the physical data to accomplish this. If timing can be improved by moving an element (lookup tables, registers, block RAMs and carry chains, for example), designers then need the ability to make the change manually and to quickly verify that it worked and that it did not cause other problems. This can be achieved quickly if the tool directly connects with an incremental timing analysis. That allows the tool to recalculate only the timing on cones of logic affected by the move, providing nearly instantaneous feedback to improve productivity.
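The incremental update described above can be sketched in Python. The netlist, the node names and all delay values are hypothetical, and the model is a plain arrival-time calculation on a small acyclic graph; it is meant to show why only the fan-out cone of the moved element needs to be revisited, not how any particular tool implements it.

```python
def arrival_at(node, arrival, fanin, gate_delay, wire_delay):
    """Latest input arrival plus this node's own gate delay."""
    latest_input = max(
        (arrival[src] + wire_delay[(src, node)] for src in fanin[node]),
        default=0.0,
    )
    return latest_input + gate_delay[node]


def full_timing(order, fanin, gate_delay, wire_delay):
    """Recompute every arrival time; `order` must be topological."""
    arrival = {}
    for node in order:
        arrival[node] = arrival_at(node, arrival, fanin, gate_delay, wire_delay)
    return arrival


def fanout_cone(start, fanout):
    """The moved node and everything reachable downstream of it."""
    cone, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in cone:
            cone.add(node)
            stack.extend(fanout[node])
    return cone


def incremental_timing(arrival, moved, order, fanin, fanout, gate_delay, wire_delay):
    """Update `arrival` in place, touching only the cone of `moved`."""
    cone = fanout_cone(moved, fanout)
    for node in order:
        if node in cone:
            arrival[node] = arrival_at(node, arrival, fanin, gate_delay, wire_delay)
    return cone


# Hypothetical netlist: a and b feed c, c feeds d, b also feeds e.
order = ["a", "b", "c", "d", "e"]
fanin = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"]}
fanout = {node: [] for node in fanin}
for node, sources in fanin.items():
    for src in sources:
        fanout[src].append(node)
gate_delay = {node: 1.0 for node in order}
wire_delay = {("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "d"): 1.0, ("b", "e"): 3.0}

arrival = full_timing(order, fanin, gate_delay, wire_delay)
before_d = arrival["d"]

# The designer drags c toward a: one wire shrinks, another grows.
wire_delay[("a", "c")] = 0.5
wire_delay[("c", "d")] = 1.5
touched = incremental_timing(arrival, "c", order, fanin, fanout, gate_delay, wire_delay)

print("recomputed nodes:", sorted(touched))
print("arrival at d:", before_d, "->", arrival["d"])
```

Only c and d are recomputed; a, b and e are never visited, yet the result matches a full recalculation. On a netlist with hundreds of thousands of nodes, that difference is what makes interactive feedback practical.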
Traditional approaches like rewriting the RTL, working with the constraints or floor planning add value, but smarter synthesis algorithms that leverage physical data are required to help the designer keep pace with technology advancements and get their products to market faster. Automated algorithms such as retiming, replication and resynthesis add value especially when they are reinforced with placement optimization. After automation has done as much as possible, an interactive environment allows the user to close on those last few signals.
---
Jeff Wilson, physical synthesis product-marketing manager for Mentor Graphics' FPGA Synthesis Business Unit (Wilsonville, Ore.), holds a bachelor's degree in design engineering from Brigham Young and a master's degree in business administration from the University of Oregon.
http://www.isdmag.com
Copyright © 2002 CMP Media LLC
6/1/02, Issue # 14156, page 24.